
Basics

Why do we need Measure Theory?

Measure theory is a branch of mathematics that provides a rigorous foundation for concepts such as length, area, and volume, and it is essential in probability theory for handling more complex and abstract probability spaces. Classical probability, which deals with finite sample spaces of equally likely outcomes (and, more generally, with countable sample spaces), can fail or be insufficient in several scenarios, particularly when dealing with continuous random variables or uncountably infinite sample spaces. Here's a simple example of a situation where classical probability breaks down:

Example: The Dartboard Problem

Scenario: Imagine throwing a dart at a standard dartboard. You are a skilled dart player, so you can hit the board, but the exact point where the dart lands is random.

Classical Probability Approach: In classical probability, you might try to calculate the probability of hitting a specific point on the dartboard. However, the dartboard contains infinitely many points, and this leads to a paradox. If you assign an equal probability to each point (since there is no reason to prefer one point over another), each point must get probability zero, because a finite total probability cannot be divided evenly among infinitely many outcomes. Adding up these zero probabilities over the whole board then seems to say that the probability of hitting the board at all is zero, which is obviously not true.

Measure Theory Approach: Measure theory comes to the rescue by introducing the concept of Lebesgue measure, which can be thought of as an extension of length, area, and volume to more complex sets. In the dartboard example, instead of assigning probabilities to individual points, we use measure theory to assign probabilities to regions or intervals. For instance, we can calculate the probability of the dart landing in a certain circular ring or sector of the dartboard. This approach acknowledges that while each individual point has zero probability, sets of points (like intervals or regions) can have a non-zero probability.
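
To make the region-based view concrete, here is a minimal Monte Carlo sketch (the function names `throw_dart` and `in_ring` and the chosen radii are illustrative, not taken from this article or the notebook mentioned below). It estimates the probability of the dart landing in a ring and compares it with the ratio of areas, i.e. the ratio of Lebesgue measures of the ring and the board.

```python
import random

# Model a dart landing uniformly at random on a unit-radius dartboard.
# Individual points have probability zero, but regions (measurable sets)
# have probability proportional to their area -- the Lebesgue-measure view.

def throw_dart():
    """Sample a uniformly random point inside the unit disk via rejection sampling."""
    while True:
        x, y = random.uniform(-1, 1), random.uniform(-1, 1)
        if x * x + y * y <= 1:
            return x, y

def in_ring(point, r_inner=0.5, r_outer=0.75):
    """Event: the dart lands in the annulus between r_inner and r_outer."""
    x, y = point
    return r_inner ** 2 <= x ** 2 + y ** 2 <= r_outer ** 2

n = 100_000
hits = sum(in_ring(throw_dart()) for _ in range(n))

# Empirical estimate vs. the ratio of areas (measures) of the ring and the board.
print("empirical  :", hits / n)
print("theoretical:", (0.75 ** 2 - 0.5 ** 2) / 1.0)  # pi cancels in the ratio
```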

Bonus: I've developed a Jupyter notebook that offers a detailed and engaging derivation of the probability density function of the normal distribution using the dart experiment. I highly recommend checking it out.

Formalization

Probability Spaces

Definition

A probability space is a triple $(\Omega, \mathcal{F}, P)$ consisting of:

  • The sample space $\Omega$, which is simply an arbitrary non-empty set
  • A $\boldsymbol{\sigma}$-algebra $\mathcal{F} \subseteq 2^{\Omega}$, which is a set of subsets of $\Omega$ satisfying
    • $\Omega \in \mathcal{F}$
    • $\mathcal{F}$ is closed under complements: if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$
    • $\mathcal{F}$ is closed under countable unions: if $A_i \in \mathcal{F}$ for $i \in \mathbb{N}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$
  • A probability measure $P : \mathcal{F} \rightarrow [0, 1]$ satisfying:
    • The measure of the entire sample space equals one: $P(\Omega) = 1$
    • $P$ is countably additive: if $\{A_i\}_{i=1}^{\infty}$ is a countable collection of pairwise disjoint sets $A_i \in \mathcal{F}$, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
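
To see the definition in action, here is a small Python sanity check on a finite sample space. The choice of $\Omega = \{1, 2, 3, 4\}$, the power set as $\mathcal{F}$, and the uniform measure are just for illustration; in the finite case the countable-union and countable-additivity conditions reduce to their finite versions.

```python
from itertools import chain, combinations

# A toy check of the definition on a finite sample space.
# Omega = {1, 2, 3, 4}; take F = the full power set and P = the uniform measure.
omega = frozenset({1, 2, 3, 4})
F = [frozenset(s) for s in chain.from_iterable(
        combinations(omega, r) for r in range(len(omega) + 1))]

def P(event):
    """Uniform probability measure: |A| / |Omega|."""
    return len(event) / len(omega)

# sigma-algebra axioms (finite case, so countable unions reduce to finite ones)
assert omega in F                                     # contains Omega
assert all(omega - A in F for A in F)                 # closed under complements
assert all(A | B in F for A in F for B in F)          # closed under (finite) unions

# probability-measure axioms
assert P(omega) == 1                                  # P(Omega) = 1
A, B = frozenset({1}), frozenset({2, 3})              # disjoint events
assert P(A | B) == P(A) + P(B)                        # additivity on disjoint sets
```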

Proposition:

Given a probability space $(\Omega, \mathcal{F}, P)$ and an event $B \in \mathcal{F}$ with $P(B) > 0$, the function $P(\cdot \mid B): \mathcal{F} \to [0, 1]$ defined by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

is a probability measure on $(\Omega, \mathcal{F})$.

Proof:

  1. Non-negativity: By the definition of conditional probability,

    $$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

    Since $P(B) > 0$ and probabilities are always non-negative, $P(A \cap B) \geq 0$, so the ratio is non-negative. Therefore, $P(A \mid B) \geq 0$ for every $A \in \mathcal{F}$. Moreover, $A \cap B \subseteq B$ implies $P(A \cap B) \leq P(B)$, so $P(A \mid B) \leq 1$ and $P(\cdot \mid B)$ indeed maps into $[0, 1]$.

  2. Normalization: Consider $A = \Omega$. Since $B \subseteq \Omega$, we have $\Omega \cap B = B$. Thus,

    $$P(\Omega \mid B) = \frac{P(\Omega \cap B)}{P(B)} = \frac{P(B)}{P(B)} = 1$$
  3. Countable additivity: Let $\{A_i\}_{i=1}^\infty$ be a countable collection of pairwise disjoint sets in $\mathcal{F}$. Since intersection distributes over unions, $\left(\bigcup_{i=1}^{\infty} A_i\right) \cap B = \bigcup_{i=1}^{\infty} (A_i \cap B)$, so

    $$P\left(\bigcup_{i=1}^{\infty} A_i \,\middle|\, B\right) = \frac{P\left(\left(\bigcup_{i=1}^{\infty} A_i\right) \cap B\right)}{P(B)} = \frac{P\left(\bigcup_{i=1}^{\infty} (A_i \cap B)\right)}{P(B)}$$

    Since the $A_i$ are pairwise disjoint, $A_i \cap B$ and $A_j \cap B$ are disjoint for $i \neq j$. By the countable additivity of the probability measure $P$, we have:

    $$P\left(\bigcup_{i=1}^{\infty} (A_i \cap B)\right) = \sum_{i=1}^{\infty} P(A_i \cap B)$$

    Substituting back into the conditional probability gives us:

    $$P\left(\bigcup_{i=1}^{\infty} A_i \,\middle|\, B\right) = \frac{\sum_{i=1}^{\infty} P(A_i \cap B)}{P(B)} = \sum_{i=1}^{\infty} \frac{P(A_i \cap B)}{P(B)} = \sum_{i=1}^{\infty} P(A_i \mid B)$$

    Therefore, $P(\cdot \mid B)$ satisfies all three axioms of a probability measure, and thus the proposition is proved. $\blacksquare$
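
As a complement to the proof (not a substitute for it), here is a short numerical sanity check of the proposition on a finite probability space; the sample space, the weights, and the conditioning event $B$ below are made up for illustration.

```python
from fractions import Fraction

# Finite sanity check of the proposition: condition a non-uniform measure
# on an event B with P(B) > 0 and confirm the conditional measure
# satisfies the probability axioms. Weights are illustrative only.
omega = ["a", "b", "c", "d"]
p = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 8), "d": Fraction(1, 8)}

def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum(p[w] for w in event)

B = {"a", "c"}                      # conditioning event, P(B) = 5/8 > 0

def P_given_B(event):
    """Conditional probability P(A | B) = P(A ∩ B) / P(B)."""
    return P(set(event) & B) / P(B)

# Normalization: P(Omega | B) = 1
assert P_given_B(omega) == 1

# Additivity on disjoint events A1, A2
A1, A2 = {"a"}, {"b", "c"}
assert P_given_B(A1 | A2) == P_given_B(A1) + P_given_B(A2)

# Values lie in [0, 1]
assert 0 <= P_given_B({"d"}) <= 1   # here it is 0, since {"d"} and B are disjoint
```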